Includes material from Ian Dworkin and Jonathan Dushoff, but they bear no responsibility for the contents.

Simplified Bayes workflow

  1. Decide on model specification; journal it, along with justifications
  2. Decide on priors, including prior predictive simulations
  3. Data exploration (graphical)
  4. Fit model
  5. MCMC diagnostics
  6. Model diagnostics, including comparing posterior predictive sims to data
  7. Interpret/predict/etc.

Examples

‘sleepstudy’ data

From ?lme4::sleepstudy, “The average reaction time per day (in milliseconds) for subjects in a sleep deprivation study.” A standard example for mixed models.

library(lme4)
library(brms)
options(brms.backend = "cmdstanr")
library(broom.mixed)
library(purrr)
library(dplyr)
library(tidybayes)
library(bayesplot)
library(bayestestR)
library(ggplot2); theme_set(theme_bw())
library(ggrastr)
library(cowplot)
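Before fitting, a quick graphical look at the data (workflow step 3): one line per subject, with an overall least-squares trend on top.

```r
library(lme4)     ## for the sleepstudy data
library(ggplot2)

## one line per subject; overall linear trend on top
ggplot(sleepstudy, aes(Days, Reaction)) +
    geom_line(aes(group = Subject), alpha = 0.5) +
    geom_smooth(method = "lm", se = FALSE)
```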

We’ll fit the model Reaction ~ Days + (Days|Subject) (a linear regression with a random intercept and slope for each subject)

\[ \newcommand{\XX}{\mathbf X} \newcommand{\ZZ}{\mathbf Z} \newcommand{\bbeta}{\boldsymbol \beta} \newcommand{\bb}{\mathbf b} \newcommand{\zero}{\boldsymbol 0} \begin{split} \textrm{Reaction} & \sim \textrm{Normal}(\XX \bbeta + \ZZ \bb, \sigma^2) \\ \bb & \sim \textrm{MVNorm}(\zero, \Sigma) \\ \Sigma & = \textrm{block-diag}\left( \begin{array}{cc} \sigma^2_i & \sigma_{is} \\ \sigma_{is} & \sigma^2_{s} \end{array} \right) \end{split} \]

form1 <- Reaction ~ Days + (Days|Subject)

prior predictive simulation

brms has a convenience function get_prior() that lays out the parameters/priors that need to be set, and their default values …

get_prior(form1, sleepstudy)
##                      prior     class      coef   group resp dpar nlpar lb ub       source
##                     (flat)         b                                              default
##                     (flat)         b      Days                               (vectorized)
##                     lkj(1)       cor                                              default
##                     lkj(1)       cor           Subject                       (vectorized)
##  student_t(3, 288.7, 59.3) Intercept                                              default
##      student_t(3, 0, 59.3)        sd                                    0         default
##      student_t(3, 0, 59.3)        sd           Subject                  0    (vectorized)
##      student_t(3, 0, 59.3)        sd      Days Subject                  0    (vectorized)
##      student_t(3, 0, 59.3)        sd Intercept Subject                  0    (vectorized)
##      student_t(3, 0, 59.3)     sigma                                    0         default

Info on brms default priors: the Intercept prior is Student \(t\) with 3 df, location equal to the observed median of the response, and scale equal to its (rescaled) MAD. RE SDs get the same distribution truncated at zero (half-\(t\); see the lb = 0 column), with mode at 0.
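We can check where those numbers come from: the default Intercept prior is centred on the response median, with scale based on its MAD (a sketch; the exact rounding and the lower bound brms places on the scale are internals).

```r
library(lme4)
## location and scale of the default Intercept prior (cf. 288.7 and 59.3 above)
median(sleepstudy$Reaction)
mad(sleepstudy$Reaction)
```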

Constraining intercept and slope to ‘reasonable’ values:

b_prior <- c(set_prior("normal(200, 50)", "Intercept"),
             set_prior("normal(0, 10)", "b")
             )

Helper function for prior predictive simulations: run a short MCMC chain, plot results. (We’re going to ignore/suppress warnings for this stage …)

test_prior <- function(p) {
    ## https://discourse.mc-stan.org/t/suppress-all-output-from-brms-in-markdown-files/30117/3
    capture.output(
        b <- brm(form1, sleepstudy, prior = p,
                 seed = 101,              ## reproducibility
                 sample_prior = 'only',   ## for prior predictive sim
                 chains = 1, iter = 500,  ## very short sample for convenience
                 silent = 2, refresh = 0  ## be vewy vewy quiet ...
                 )
    )
    p_df <- sleepstudy |> add_predicted_draws(b)
    ## 'spaghetti plot' of prior preds
    gg0 <- ggplot(p_df,aes(Days, .prediction, group=interaction(Subject,.draw))) +
        geom_line(alpha = 0.1)
    print(gg0)
    invisible(b) ## return without printing
}
test_prior(b_prior)

Decrease random-effects SDs:

b_prior2 <- c(set_prior("normal(200, 10)", "Intercept"),
              set_prior("normal(0, 5)", "b"),
              set_prior("student_t(3, 0, 0.1)", "sd")
              )
test_prior(b_prior2)

(Open question: why are these predictions not noisy, given the large default prior on sigma?)

Make all scales even smaller?

b_prior3 <- c(set_prior("normal(200, 5)", "Intercept"),
              set_prior("normal(0, 2)", "b"),
              set_prior("student_t(3, 0, 0.01)", "sd")
             )
test_prior(b_prior3)

We forgot to constrain the prior for the residual standard deviation!

b_prior4 <- c(set_prior("normal(200, 5)", "Intercept"),
              set_prior("normal(0, 2)", "b"),
              set_prior("normal(0, 1)", "sd"),
              set_prior("normal(0, 1)", "sigma")
             )
test_prior(b_prior4)

Now relax a bit …

b_prior5 <- c(set_prior("normal(200, 10)", "Intercept"),
              set_prior("normal(0, 8)", "b"),
              set_prior("student_t(10, 0, 3)", "sd"),
              set_prior("student_t(10, 0, 3)", "sigma")
             )
test_prior(b_prior5)

In hindsight, we should relax more: set the intercept prior back to its default and widen the fixed-effect (slope) prior.

b_prior6 <- c(set_prior("normal(0, 20)", "b"),
              set_prior("student_t(10, 0, 3)", "sd"),
              set_prior("student_t(10, 0, 3)", "sigma")
             )
test_prior(b_prior6)

fitting

m_lmer <- lmer(form1, sleepstudy)
b_reg <- brm(form1, sleepstudy, prior = b_prior5,
             seed = 101,                         ## reproducibility
             control = list(adapt_delta = 0.95)  ## handle divergences
             )
b_default <- brm(form1, sleepstudy,
                 seed = 101  ## reproducibility
                 )
## precomputed fits (includes b_reg2, fitted with b_prior6), used below
load("examples1.rda")

diagnose

print(bayestestR::diagnostic_posterior(b_reg),
      digits = 4)
##     Parameter  Rhat  ESS    MCSE
## 1      b_Days 1.001 1185 0.08135
## 2 b_Intercept 1.002 1152 0.28076
summary(b_reg)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Reaction ~ Days + (Days | Subject) 
##    Data: sleepstudy (Number of observations: 180) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~Subject (Number of levels: 18) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)          20.69      7.76     6.17    37.31 1.00      865      577
## sd(Days)               10.89      2.68     6.51    16.76 1.00     1212     2037
## cor(Intercept,Days)     0.50      0.25    -0.04     0.91 1.00      703      786
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept   233.85      9.53   213.16   250.81 1.00     1064      934
## Days         -0.42      2.80    -5.97     4.76 1.00     1224     1832
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    25.69      1.63    22.87    29.21 1.00     2231     2618
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

\(\hat R\) (Rhat) and ESS look OK. Vehtari et al. (2021) recommend an \(\hat R\) threshold of 1.01, ESS > 400, and an MCSE (Monte Carlo standard error) that is ‘small enough’: figure out the accuracy needed for the quantity of interest (for reporting, two significant digits is usually enough).
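As a sanity check, the MCSE values reported by diagnostic_posterior() match the rule of thumb \(\textrm{MCSE} \approx \textrm{SD}/\sqrt{\textrm{ESS}}\):

```r
## posterior SD of b_Days is about 2.80 (from summary()), ESS about 1185:
2.80 / sqrt(1185)  ## about 0.081, matching the reported MCSE for b_Days
```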

regex_pars specifies a regular expression for deciding which parameters to show

mcmc_trace(b_reg, regex_pars= "b_|sd_")
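To see what that pattern selects, here is the regex applied to this model's parameter names (the names below follow brms's usual naming scheme and are written out by hand, not extracted from the fit):

```r
pars <- c("b_Intercept", "b_Days",
          "sd_Subject__Intercept", "sd_Subject__Days",
          "cor_Subject__Intercept__Days", "sigma")
grep("b_|sd_", pars, value = TRUE)  ## drops the correlation and sigma
```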

Vehtari et al. (2021) recommend rank-histograms instead: chains should have similar rank distributions

mcmc_rank_overlay(b_reg, regex_pars= "b_|sd_")

Plot results. broom.mixed::tidy() will get what you need; the complicated code below is just to get everything lined up nicely.

brms_modlist <- list(brms_default = b_default, brms_reg = b_reg, brms_reg2 = b_reg2)
res_bayes <- (brms_modlist
    |> purrr::map_dfr(~ tidy(., conf.int = TRUE), .id = "model")
)
## need to do separately - different conf.method choices
res_lmer <- (m_lmer
    |> tidy(conf.int = TRUE, conf.method = "profile")
    |> mutate(model = "lmer", .before = 1)
)
## Computing profile confidence intervals ...
## Computing profile confidence intervals ...
res <- (bind_rows(res_lmer, res_bayes)
    |> select(-c(std.error, statistic, component, group))
    |> filter(term != "(Intercept)")
    |> mutate(facet = ifelse(grepl("^cor", term), "cor",
                      ifelse(grepl("Days", term), "Days",
                             "int")))
)
ggplot(res, aes(estimate, term, colour = model, shape = model)) +
    geom_pointrange(aes(xmin = conf.low, xmax = conf.high),
                    position = position_dodge(width = 0.5)) +
    facet_wrap(~ facet, scale = "free",ncol = 1) +
    guides(colour = guide_legend(reverse=TRUE),
           shape = guide_legend(reverse=TRUE))

posterior predictive simulations, compare with data

post_df1 <- sleepstudy |> add_predicted_draws(b_reg)
gg1 <- ggplot(post_df1,
              aes(Days, .prediction, group=interaction(Subject,.draw))) +
    rasterise(geom_line(alpha = 0.1)) +
    geom_point(aes(y=Reaction), col = "red")
print(gg1)

post_df2 <- sleepstudy |> add_predicted_draws(b_reg2)
print(gg1 %+% post_df2)

post_df1B <- filter(post_df1, Subject == levels(Subject)[1])
post_df2B <- filter(post_df2, Subject == levels(Subject)[1])
plot_grid(gg1 %+% post_df1B, gg1 %+% post_df2B)

list(bad = b_reg, good = b_reg2) |>
    purrr::map_dfr(tidy, effects = "fixed", .id = "priors")
## # A tibble: 4 × 8
##   priors effect component term        estimate std.error conf.low conf.high
##   <chr>  <chr>  <chr>     <chr>          <dbl>     <dbl>    <dbl>     <dbl>
## 1 bad    fixed  cond      (Intercept)  234.         9.53   213.      251.  
## 2 bad    fixed  cond      Days          -0.418      2.80    -5.97      4.76
## 3 good   fixed  cond      (Intercept)  251.         4.88   242.      261.  
## 4 good   fixed  cond      Days          10.4        1.50     7.55     13.5

Open question: I can see that b_prior5/b_reg gives “wrong” answers (the priors are too tight), but I would like to be able to diagnose that from the posterior predictive plots; I can’t see any difference between these and the b_prior6/b_reg2 PostPD plots. What am I missing?

Plot random effect draws, from here:

Random effects (deviations from population-level/fixed-effect predictions):

brms_modlist <- list(brms_default = b_default, brms_reg = b_reg, brms_reg2 = b_reg2)
ranefs <- (brms_modlist
    |> purrr::map_dfr(~ tidy(., effects = "ran_vals", conf.int = TRUE), .id = "model")
)
gg_r <- ggplot(ranefs, aes(estimate, level, colour = model)) +
    geom_pointrange(aes(xmin = conf.low, xmax = conf.high), position = position_dodge(width = 0.5)) +
    facet_wrap(~term, scale = "free", nrow = 1)
print(gg_r +  geom_vline(lty = 2, xintercept = 0))

Group-level coefficients (predicted intercept/slope for each subject):

## this is way too ugly - needs to be incorporated into broom.mixed as an option
my_coefs <- function(x) {
    meltfun <- function(a) {
        as.data.frame(ftable(a)) |>
            setNames(c("level", "var", "term", "value")) |>
            tidyr::pivot_wider(names_from = var, values_from = value) |>
            rename(estimate = "Estimate",
                   std.error = "Est.Error",
                   ## FIXME: not robust to changing levels
                   conf.low = "Q2.5",
                   conf.high = "Q97.5")
    }
    purrr::map_dfr(coef(x), meltfun, .id = "group")
}
coefs <- (brms_modlist
    |> purrr::map_dfr(my_coefs, .id = "model")
)
print(gg_r %+% coefs)

check_prior(b_reg)
##     Parameter Prior_Quality
## 1 b_Intercept   informative
## 2      b_Days   informative
try(check_prior(b_reg2)) ## ugh
## Error in if (stats::sd(posterior, na.rm = TRUE) > 0.1 * stats::sd(prior,  : 
##   missing value where TRUE/FALSE needed
try(check_prior(b_reg2, method = "lakeland")) ## ugh
##     Parameter Prior_Quality
## 1 b_Intercept   informative
## 2      b_Days   informative

References

Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2021. “Rank-Normalization, Folding, and Localization: An Improved R-hat for Assessing Convergence of MCMC (with Discussion).” Bayesian Analysis 16 (2): 667–718. https://doi.org/10.1214/20-BA1221.


Last updated: 31 May 2023 17:04